Detecting translated open reading frames (ORFs) within Ribo-Seq data often requires evaluating the reads within your captured ORFs to determine if typical properties of translation, such as triplet periodicity are present. inspectorORF aims to simplify the task of plotting ORFs, with the following features:
Install Bioconductor dependencies -
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install(c("plyranges", "bioseq", "Biostrings", "GenomicRanges", "rtracklayer"))
Load the inspectorORF library -
library(inspectorORF)
inspectorORF requires genomic sequences to be in 2bit format. Your genome FASTA of interest can be converted to with UCSC’s TwoBit tool faToTwoBit as follows:
faToTwoBit gencode.v44.genome.fasta gencode.v44.genome.2bit
Additionally, coverage of the RNA-Seq data needs to be calculated.
This can be performed with bedtools, using
bedtools coverage:
bedtools coverage -abam sample_alignment.bam -b gencode.v44.annotation.gtf > control_rna_tracks.bed
gzip control_rna_tracks.bed # optional, but recommended, especially if dealing with large genomes.
Alternative coverage tools can also be used, as long as the resulting bed files consists of chromosome, start, end, strand and score columns, where each start and end position is incremental, or chromosome, start, end, [ignored], [ignored], strand, position, score where position is incremental. Columns must be in those orders
A note if using RiboTaper data - RNA-Seq coverage is
performed as part of the RiboTaper pipeline and therefore, the
bedtools coverage step is not required.
Before usage, reads from either ORFquant and bedtoolsbedtools or RiboTaper need to be imported and processed.
As RiboTaper
calculates both the RNA-Seq coverage and P-sites for you, simply using
inspectorORF::import_tracks_from_RiboTaper() with one or
more of the Psit_Ribo_Rna_Cent_tracks_* files found within the
tracks_data folder of the RiboTaper results.
Please note, no example data is provided for running an
example with RiboTaper data, and is simply provided to demonstrate how
to create a tracks object from RiboTaper data. Additionally,
inspectorORF::import_tracks_from_RiboTaper only supports
one sample and therefore, multiple conditions, replicates, or additional
data can not be supplied when importing from RiboTaper data. For
plotting additional conditions or extra genomic data, the
inspectorORF::merge_tracks_with_ORFquant function is
required.
# No such RiboTaper data is provided for running as an
# example, all example data focuses on ORFquant since
# this is the successor to RiboTaper.
tracks <- inspectorORF::import_tracks_from_RiboTaper(
c("Psit_Ribo_Rna_Cent_tracks_ccds",
"Psit_Ribo_Rna_Cent_tracks_nonccds")
)
Data from ORFquant can also be utilised. RNA-Seq coverage previously
calculated with bedtools coverage and the P-sites
calculated by ORFquant can be merged with the
inspectorORF::merge_tracks_with_ORFquant function as
follows, which will return a GRanges object.
All example data has been processed from GEO accession GSE16214, aligned to Gencode V44 and subset for the NTN4 gene.
tracks <- inspectorORF::merge_RNA_tracks_with_ORFquant(
rna_reads = system.file("example_data", "control_rna_tracks.bed.gz", package = "inspectorORF"),
orfquant_psites = system.file("example_data", "control_psites_for_ORFquant", package = "inspectorORF")
)
#> [1] "Reading in RNA-Seq data"
#> [1] "Importing data for rna_reads"
#> [1] "Reading in ORFquant P-Site data"
#> [1] "Importing data for p_sites"
#> [1] "Merging RNA-Seq reads with P-Sites"
If we evaluate the first 6 rows of the returned
genome_tracks object, this shows the reads we have at each
chromosome coordinate:
datatable(as.data.frame(tracks), options = list(pageLength = 10))
If you wish to add additional conditions, such as a treated
condition, the extra and extra_orfquant_psites
parameters can be utilised. These options expect a named list, where
each element consists of the file name of the data to be added. Any
names within the list can be used, as this information will be utilised
to created an additional column name.
For example, if we have a treated condition consisting of RNA-Seq reads and calculated P-sites, the following example would create create tracks for multiple conditions:
tracks <- inspectorORF::merge_RNA_tracks_with_ORFquant(
rna_reads = system.file("example_data", "control_rna_tracks.bed.gz", package = "inspectorORF"),
orfquant_psites = system.file("example_data", "control_psites_for_ORFquant", package = "inspectorORF"),
extra = list(treated_rna_reads = system.file("example_data", "treated_rna_tracks.bed.gz", package = "inspectorORF")),
extra_orfquant_psites = list(treated_psites = system.file("example_data", "treated_psites_for_ORFquant", package = "inspectorORF"))
)
#> [1] "Reading in RNA-Seq data"
#> [1] "Importing data for rna_reads"
#> [1] "Importing data for treated_rna_reads"
#> [1] "Reading in ORFquant P-Site data"
#> [1] "Importing data for p_sites"
#> [1] "Importing data for treated_psites"
#> [1] "Merging RNA-Seq reads with P-Sites"
# Optionally save the GRanges data into an RData file
save(tracks, file = "tracks.RData")
Addtional track information from the treated RNA-Seq and ORFquant
data can be seen when we evaluate the tracks with head:
datatable(as.data.frame(tracks), options = list(pageLength = 10))
To plot reads from a transcript or ORF, reads associated with either genes(s) or transcript(s) of interest must be obtained. Gene level tracks requires additional steps for plotting but has the added benefit of being able to plot differing transcript variants of interest.
The example data provided consists of reads from RNA-Seq and P-Sites calculated from Ribo-Seq data (processed with ORFquant) for the NTN4 gene from undifferentiated (control) and differentiated (ra-treated) SH-SY5Y cells. To obtain gene level tracks for the human gene netrin 4 (NTN4, gene id ENSG00000074527.13), save the data as a bed file for future importing and obtain transcript level tracks for the transcript variant ENST00000343702.9 for plotting can be performed as follows:
# Obtain gene level tracks for the NTN4 gene - ENSG00000074527.13
gene_tracks <- inspectorORF::get_gene_tracks(
tracks,
gtf_file = system.file("example_data", "annotation_subset.gtf", package = "inspectorORF"),
genome_file = system.file("example_data", "chr12.2bit", package = "inspectorORF"),
gene_ids = c("ENSG00000074527.13"),
framed_tracks = c("p_sites", "treated_psites") # required when multiple ORFquant data us provided
)
# Obtain tracks for specific transcript variants of the NTN4 gene.
tx_tracks_from_gene <- inspectorORF::gene_to_transcript_tracks(
gene_tracks,
transcript_filter = c("ENST00000343702.9") # multiple transcript ids supported
)
# Save the gene level track information as a bed file for future use.
inspectorORF::save_tracks(gene_tracks, "NTN4_gene_tracks.bed")
Alternatively, tracks for just the NTN4 transcript ENST00000343702.9 can be directly obtained and saved, requiring fewer steps but will need to be repeated if alternative transcripts of NTN4 wish to be plot in future.
tx_tracks <- inspectorORF::get_transcript_tracks(
tracks,
gtf_file = system.file("example_data", "annotation_subset.gtf", package = "inspectorORF"),
genome_file = system.file("example_data", "chr12.2bit", package = "inspectorORF"),
transcript_ids = c("ENST00000343702.9"), # multiple transcript ids supported
framed_tracks = c("p_sites", "treated_psites") # required when multiple ORFquant data us provided
)
# Save the transcript level track information as a bed file for future use.
inspectorORF::save_tracks(tx_tracks, "NTN4_ENST00000343702_tracks.bed")
A simple transcript plot can be made by calling
inspectorORF::transcript_plot on our tx_tracks
object, requesting the transcript of interest
A plot of reads across a full transcript of interest can then be
performed the inspectorORF::transcript_plot. The RNA-Seq
coverage being peaks around the P-Site coverage rather than a more
uniform coverage due to these reads being derived from
polysome-associated RNA-Seq data.
inspectorORF::transcript_plot(
tx_tracks,
transcript_filter = "ENST00000343702.9" # Requires 1 transcript at a time
)
or, plot a transcript from tracks created by gene information originally:
inspectorORF::transcript_plot(
tx_tracks_from_gene,
transcript_filter = "ENST00000343702.9" # Requires 1 transcript at a time
)
Exons making up a transcript can also be plot by splitting the figure
into exons with the split_exons = TRUE option -
inspectorORF::transcript_plot(
tx_tracks,
transcript_filter = "ENST00000343702.9", # Requires 1 transcript at a time
split_exons = TRUE
)
As can be seen in the above plot, although both RNA-Seq reads and P-Sites are found across the transcript, such reads are limited and fairly low, suggesting any coding sequences found within this transcript to not be translated. The original ORFquant results for this dataset also reflect this, with no coding sequence within ENST00000343702.9 being flagged as translated.
The annotated coding sequence within ENST00000343702.9 can be plot to
get a better understanding the P-Site reads we do have, to see if they
are primarily within the coding sequence. This can be undertaken by
highlighting a region of interest by specifying a
start_position and stop_position.
Please note, no checks will be performed if start_position and stop_position are true start and stop codons, or if they’re within the same frame. Plots will be given regardless.
inspectorORF::transcript_plot(
tx_tracks,
start_position = 456,
stop_position = 2339,
transcript_filter = "ENST00000343702.9" # Requires 1 transcript at a time
)
If RNA-Seq reads have a much higher depth to the Ribo-Seq/P-site
information, such as with poorly translated lncRNAs, the RNA-Seq reads
can be capped to the highest depth P-Site read by setting
scale_to_psites = TRUE
inspectorORF::transcript_plot(
tx_tracks,
transcript_filter = "ENST00000343702.9", # Requires 1 transcript at a time
start_position = 456,
stop_position = 2339,
scale_to_psites = T
)
Again, although reads are within the highlighted coding sequence, such reads are low, making it unlikely that this coding sequence is truly translated.
Evaluating P-sites and triplet periodicity of reads outside of the ORF compared to the triplet periodicity within the ORF () can be helpful to understand the likelihood the ORF is being translated. As such, if triplet periodicity is present within the ORF, with reads primarily consisting of the ORFs frame, but triplet periodicity is lacking within the remainder of the transcript, this can give some confidence in the ORF truly being translated.
An additional triplet periodicity summary plot for the remainder of
the transcript can be enabled with the
summarise_outside_region = TRUE:
inspectorORF::transcript_plot(
tx_tracks,
transcript_filter = "ENST00000343702.9", # Requires 1 transcript at a time
start_position = 456,
stop_position = 2339,
scale_to_psites = T,
summarise_outside_region = T
)
During gene or transcript track generation, reads within introns can
be retained with the keep_introns = TRUE to enable plotting
intronic reads later -
my_tracks <- inspectorORF::merge_RNA_tracks_with_ORFquant(
rna_reads = system.file("example_data", "control_rna_tracks.bed.gz", package = "inspectorORF"),
orfquant_psites = system.file("example_data", "control_psites_for_ORFquant", package = "inspectorORF")
)
#> [1] "Reading in RNA-Seq data"
#> [1] "Importing data for rna_reads"
#> [1] "Reading in ORFquant P-Site data"
#> [1] "Importing data for p_sites"
#> [1] "Merging RNA-Seq reads with P-Sites"
gene_tracks <- inspectorORF::get_gene_tracks(
my_tracks,
gtf_file = system.file("example_data", "annotation_subset.gtf", package = "inspectorORF"),
genome_file = system.file("example_data", "chr12.2bit", package = "inspectorORF"),
keep_introns = TRUE,
gene_ids = c("ENSG00000074527.13")
)
# Any track generation from gene tracks will therefore contain intronic reads
tx_from_gene_tracks <- inspectorORF::gene_to_transcript_tracks(
gene_tracks,
transcript_filter = c("ENST00000343702.9") # multiple transcript ids supported
)
# Or, with get_transcript_tracks directly
my_tx_tracks <- inspectorORF::get_transcript_tracks(
my_tracks,
gtf_file = system.file("example_data", "annotation_subset.gtf", package = "inspectorORF"),
genome_file = system.file("example_data", "chr12.2bit", package = "inspectorORF"),
transcript_ids = c("ENST00000343702.9"), # multiple transcript ids supported
keep_introns = TRUE,
framed_tracks = c("p_sites") # required when multiple ORFquant data us provided
)
Including and annotating introns within a plot can then be performed
by setting the split_exons parameter of transcript_plot to
“with_introns” -
inspectorORF::transcript_plot(
my_tx_tracks,
transcript_filter = "ENST00000343702.9", # Requires 1 transcript at a time
split_exons = "with_introns"
)
Helper functions are provided to search for ORFs and obtain the nucleotide sequence, amino acid sequence, and framing scores for each ORF frame.
These are not strictly required, as transcript_filter, start_position and optionally stop_position can be specified manually within the plotting functions.
If ORFs of interest aren’t known, or you wish to identify ORFs
(simply a start and stop, these functions do not consider reads to
detect translated ORFs. Tools exist for such analysis, such as ORFquant)
to evaluate reads, we use the inspectorORF::find_orfs
function to search for ORFs in a given transcript.
# Find ORFs in a specific transcript
orfs <- find_orfs(
tx_tracks,
transcript_filter = "ENST00000343702.9"
)
# Show the ORFs as a tibble, displaying start codon, stop codon, framing and reads present
datatable(as.data.frame(orfs), options = list(pageLength = 10))
Additional filter options can also be supplied to
inspectorORF::find_orfs, such as:
Additional helper functions exist to obtain both nucleotide sequences and amino acid residues of an ORF.
Nucleotide sequences are obtained with
inspectorORF::get_orf_nt_seq, requiring the transcript of
interest, and either an object from the list provided by
inspectorORF::find_orfs, or a start_position
parameter which will obtain the sequence up to the nearest in-frame stop
codon. Optionally, a stop_poisition parameter can be
supplied to override the detection of the nearest in-frame stop
codon.
orf_nt_sequence <- inspectorORF::get_orf_nt_seq(
tx_tracks,
transcript_filter = "ENST00000343702.9",
start_position = 456
)
print(orf_nt_sequence)
#> RNAStringSet object of length 1:
#> width seq names
#> [1] 1884 AUGGGGAGCUGCGCGCGGCUGCU...AUAUUUUAAAAAGAGAGUGCAAG ENST00000343702.9...
To obtain the amino acid residues of an ORF, the
inspectorORF::get_orf_aa_seq function is utilised,
accepting the same options as
inspectorORF::get_orf_nt_seq.
orf_aa_sequence <- inspectorORF::get_orf_aa_seq(
tx_tracks,
transcript_filter = "ENST00000343702.9",
start_position = 456
)
print(orf_aa_sequence)
#> AAStringSet object of length 1:
#> width seq names
#> [1] 628 MGSCARLLLLWGCTVVAAGLSGV...SFVQHWKPSLGRKVMDILKRECK ENST00000343702.9...
Plotting just an ORF an interest can be performed with
inspectorORF::orf_plot. This accepts similar options to
inspectorORF::transcript_plot
Plotting ORFs can either be undertaken by supplying an object ORF
position identified by inspectorORF::find_orfs,
inspectorORF::orf_plot(
tx_tracks,
orf_object = inspectorORF::get_orf(orfs, start_position = 456)
)
Or by supplying the transcript_id, start_position and stop_position positions directly.
Please note, no checks will be performed if orf_start and orf_stop are true start and stop codons, or if they’re within the same frame. Plots will be given regardless.
inspectorORF::orf_plot(
tx_tracks,
transcript_filter = "ENST00000343702.9",
start_position = 456
)
Reads outside of the ORF of interest can also be included by setting
plot_region = c(-1, -1) which will plot these regions as 5’
and 3’.
However, this can be particularly useful to plot a portion of the
untranslated region upstream, downstream or both, of the ORF. this can
be included in the plot with plot_region as specific sizes.
These values are relative to the provided start and stop positions. E.G.
plot_region = c(500, NA),
plot_region = c(NA, 500), and
plot_region = c(500, NA) will include 50nt upstream, 50nt
downstream, and 50nt both upstream and downstream to the ORF
requested.
inspectorORF::orf_plot(
tx_tracks,
transcript_filter = "ENST00000343702.9",
start_position = 456,
stop_position = 2339,
plot_region = c(500, 500)
)
If the data was prepared with either multiple conditions or multiple replicates, inspectorORF supports plotting multiple figures of the same ORF for more than one condition. For example:
inspectorORF::orf_plot(
tx_tracks,
transcript_filter = "ENST00000343702.9",
start_position = 456,
stop_position = 2339,
scale_to_psites = T,
# Set two names for Control and Treated condition
condition_names = c("rna_reads" = "Control",
"treated_rna_reads" = "Treated"),
# Declare names for the RNA and P-Site reads
plot_read_pairs = c("p_sites" = "rna_reads",
"treated_psites" = "treated_rna_reads"),
# Declare names for the RNA and P-Site reads
dataset_names = c("rna_reads" = "RNA-Seq Reads",
"p_sites" = "P-Sites",
"treated_rna_reads" = "RNA-Seq Reads",
"treated_psites" = "P-Sites"),
# State the colours for the additional treated_rna_reads
plot_colours = c("rna_reads" = "grey60",
"treated_rna_reads" = "grey60",
"0" = "#440854",
"1" = "#23A884",
"2" = "#FEE725")
)
This plot shows that coverage for both RNA-Seq and P-Sites are significantly higher and spread across the coding sequence of interested in our treated (differentiated) cells. Suggesting the coding sequence is translated.
This reflects what ORFquant reports too, with this coding region being flagged as translated -
| seqnames | start | end | width | strand |
|---|---|---|---|---|
| ENST00000343702.9 | 456 | 2339 | 1884 | - |
Additional data can be supplied at either the generation of genome
tracks with additional information such as reads from long-read
sequencing, or at the transcript level of track extraction with the
additional_info option of
inspectorORF::get_transcript_tracks such as calculated
Kozak scores or transcript level mapped proteomics data.
# Create the main tracks information consisting of data prepared from long-read sequencing.
tracks <- inspectorORF::merge_RNA_tracks_with_ORFquant(
rna_reads = system.file("example_data", "control_rna_tracks.bed.gz", package = "inspectorORF"),
orfquant_psites = system.file("example_data", "control_psites_for_ORFquant", package = "inspectorORF"),
extra = list(treated_rna_reads = system.file("example_data", "treated_rna_tracks.bed.gz", package = "inspectorORF"),
# Control dRNA tracks not provided due to such reads being absent within the control cells.
treated_dRNA_reads = system.file("example_data", "treated_dRNA_tracks.bed.gz", package = "inspectorORF")),
extra_orfquant_psites = list(treated_psites = system.file("example_data", "treated_psites_for_ORFquant", package = "inspectorORF"))
)
#> [1] "Reading in RNA-Seq data"
#> [1] "Importing data for rna_reads"
#> [1] "Importing data for treated_rna_reads"
#> [1] "Importing data for treated_dRNA_reads"
#> [1] "Reading in ORFquant P-Site data"
#> [1] "Importing data for p_sites"
#> [1] "Importing data for treated_psites"
#> [1] "Merging RNA-Seq reads with P-Sites"
Supplying additional transcript level information requires the data to be in the form of a data frame (or something which can be cohersed into a data frame) with the following layout.
| transcript_id | position | custom_column |
|---|---|---|
| ENST1… | 5 | 0.5 |
| ENST1… | 6 | 0.8 |
| ENST2… | 2 | 1.0 |
Multiple custom columns can be supplied and the values within are required to be numeric, leading to an additional bar graph for each custom column found.
For example, Kozak scores were previously calculated in-frame AUG codons for our transcript of interest (using a field labelled KSS) along with proteomics evidence (using a field labelled MS)
NTN4_KSS_and_MS <- read_csv(system.file("example_data", "NTN4_additional_tx_data.csv", package = "inspectorORF"))
#> Rows: 3882 Columns: 4
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): transcript_id
#> dbl (3): position, KSS, MS
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
datatable(NTN4_KSS_and_MS, options = list(pageLength = 10))
Kozak scores can be calculated using ORFik or a webservice, for example https://www.tispredictor.com/kss. In the case of the proteomics data, we have set regions of a transcript with no proteomic evidence as zeros and regions of transcript where a peptide hit was present as values depending on how many overlapping hits are identified, which will result in solid blocks of colour for such regions.
Finally, the transcript tracks for the additional genomic and transcriptomic level data can be obtained and plotted.
# Obtain transcript tracks for ENST00000343702.9 along with our additional scores
tx_tracks <- inspectorORF::get_transcript_tracks(
tracks,
gtf_file = system.file("example_data", "annotation_subset.gtf", package = "inspectorORF"),
genome_file = system.file("example_data", "chr12.2bit", package = "inspectorORF"),
transcript_ids = c("ENST00000343702.9"),
additional_info = NTN4_KSS_and_MS,
framed_tracks = c("p_sites", "treated_psites")
)
inspectorORF::orf_plot(
tx_tracks,
transcript_filter = "ENST00000343702.9",
start_position = 456,
stop_position = 2339,
scale_to_psites = T,
condition_names = c("rna_reads" = "Control",
"treated_rna_reads" = "Treated"),
plot_read_pairs = c("p_sites" = "rna_reads",
"treated_psites" = "treated_rna_reads"),
# Declare names for the RNA and P-Site reads, including the additional
# dRNA_reads, KSS and MS data for them to be included on the plot
dataset_names = c("rna_reads" = "RNA-Seq Reads",
"p_sites" = "P-Sites",
"treated_rna_reads" = "RNA-Seq Reads",
"treated_psites" = "P-Sites",
"treated_dRNA_reads" = "Treated dRNA-Seq Reads",
"KSS" = "AUG Kozak scores",
"MS" = "Proteomics"),
# State the colours for the additional data
plot_colours = c("rna_reads" = "grey60",
"treated_rna_reads" = "grey60",
"treated_dRNA_reads" = "orange",
"KSS" = "hotpink4",
"MS" = "black",
"0" = "#440854",
"1" = "#23A884",
"2" = "#FEE725")
)
Codons can be labelled on the plot in order to evaluate alternative start codons that may be present within the transcript or ORF. This is particularly useful if there are upstream start codons which may act as an alternative initiation site, or for the analysis of near-cognate start codons (E.G CUG, or GUG).
Therefore, additional options are supplied within inspectorORF to annotate the following:
Annotating such codons is performed by supplying a list of queries
with the codon_queries parameter of
inspectorORF::orf_plot, specifying what codon(s) to plot,
their framing, and colour.
E.G, to label the ORFs start codon and in frame only canonical (AUG) start codons:
inspectorORF::orf_plot(
tx_tracks,
transcript_filter = "ENST00000343702.9",
start_position = 456,
stop_position = 2339,
scale_to_psites = T,
condition_names = c("rna_reads" = "Control",
"treated_rna_reads" = "Treated"),
plot_read_pairs = c("p_sites" = "rna_reads",
"treated_psites" = "treated_rna_reads"),
# Declare names for the RNA and P-Site reads, including the additional
# dRNA_reads, KSS and MS data for them to be included on the plot
dataset_names = c("rna_reads" = "RNA-Seq Reads",
"p_sites" = "P-Sites",
"treated_rna_reads" = "RNA-Seq Reads",
"treated_psites" = "P-Sites",
"treated_dRNA_reads" = "Treated dRNA-Seq Reads",
"KSS" = "AUG Kozak scores",
"MS" = "Proteomics"),
# State the colours for the additional data
plot_colours = c("rna_reads" = "grey60",
"treated_rna_reads" = "grey60",
"treated_dRNA_reads" = "orange",
"KSS" = "hotpink4",
"MS" = "black",
"0" = "#440854",
"1" = "#23A884",
"2" = "#FEE725"),
codon_queries = list(
inspectorORF::codon_query(annotate_start = T, colour = "black"),
inspectorORF::codon_query(annotation_codons = c("AUG"), in_frame = T, colour = "darkred")
)
)
Additionally, codons can be coloured based on the frame the codon is found in, by setting “use_framing” as the colour option. Allowing out of frame codons can be useful here too, although in this case, this results in a fairly noisy plot:
inspectorORF::orf_plot(
tx_tracks,
transcript_filter = "ENST00000343702.9",
start_position = 456,
stop_position = 2339,
scale_to_psites = T,
condition_names = c("rna_reads" = "Control",
"treated_rna_reads" = "Treated"),
plot_read_pairs = c("p_sites" = "rna_reads",
"treated_psites" = "treated_rna_reads"),
# Declare names for the RNA and P-Site reads, including the additional
# dRNA_reads, KSS and MS data for them to be included on the plot
dataset_names = c("rna_reads" = "RNA-Seq Reads",
"p_sites" = "P-Sites",
"treated_rna_reads" = "RNA-Seq Reads",
"treated_psites" = "P-Sites",
"treated_dRNA_reads" = "Treated dRNA-Seq Reads",
"KSS" = "AUG Kozak scores",
"MS" = "Proteomics"),
# State the colours for the additional data
plot_colours = c("rna_reads" = "grey60",
"treated_rna_reads" = "grey60",
"treated_dRNA_reads" = "orange",
"KSS" = "hotpink4",
"MS" = "black",
"0" = "#440854",
"1" = "#23A884",
"2" = "#FEE725"),
codon_queries = list(
inspectorORF::codon_query(annotation_codons = c("AUG"), colour = "use_framing")
)
)
As can be seen in the plot, there are many AUG start codons within the ORF, consisting of various frames. Such codons can also be plotted within the UTRs, to evaluate potential upstream alternative initiation sites. Or we can evaluate in-frame codons, across the whole transcript:
inspectorORF::orf_plot(
tx_tracks,
transcript_filter = "ENST00000343702.9",
start_position = 456,
stop_position = 2339,
scale_to_psites = T,
summarise_outside_orf = T,
plot_region = c(-1, -1),
text_size = 14,
condition_names = c("rna_reads" = "Control",
"treated_rna_reads" = "Treated"),
plot_read_pairs = c("p_sites" = "rna_reads",
"treated_psites" = "treated_rna_reads"),
# Declare names for the RNA and P-Site reads, including the additional
# dRNA_reads, KSS and MS data for them to be included on the plot
dataset_names = c("rna_reads" = "RNA-Seq Reads",
"p_sites" = "P-Sites",
"treated_rna_reads" = "RNA-Seq Reads",
"treated_psites" = "P-Sites",
"treated_dRNA_reads" = "Treated\ndRNA-Seq\nReads",
"KSS" = "AUG Kozak\nscores",
"MS" = "Proteomics"),
# State the colours for the additional data
plot_colours = c(
"rna_reads" = "grey70",
"treated_rna_reads" = "grey60",
"treated_dRNA_reads" = "#999900",
"KSS" = "#008080",
"MS" = "black",
"0" = "#B12A90",
"1" = "#0072B2",
"2" = "#E66100"
),
codon_queries = list(
inspectorORF::codon_query(annotation_codons = c("AUG"), in_frame = T, colour = "use_framing")
)
)
Two observations can be made from this plot. First of all, there are no upstream in-frame AUG start codons, suggesting initiation does occur at the start codon described for the coding seauence. Secondly, even if we evaluate alternative start codons, although two upstream in-frame CUG codons are present, there is zero evidence of these being utilised for initiation due to the lack of P-Site coverage downstream to these.
For example, to include in-frame upstream AUG and the near cognate codon CUG:
inspectorORF::orf_plot(
tx_tracks,
transcript_filter = "ENST00000343702.9",
start_position = 456,
stop_position = 2339,
scale_to_psites = T,
# Add 5'UTR region from start of transcript
plot_region = c(-1, NA),
condition_names = c("rna_reads" = "Control",
"treated_rna_reads" = "Treated"),
plot_read_pairs = c("p_sites" = "rna_reads",
"treated_psites" = "treated_rna_reads"),
# Declare names for the RNA and P-Site reads, including the additional
# dRNA_reads, KSS and MS data for them to be included on the plot
dataset_names = c("rna_reads" = "RNA-Seq Reads",
"p_sites" = "P-Sites",
"treated_rna_reads" = "RNA-Seq Reads",
"treated_psites" = "P-Sites",
"treated_dRNA_reads" = "Treated dRNA-Seq Reads",
"KSS" = "AUG Kozak scores",
"MS" = "Proteomics"),
# State the colours for the additional data
plot_colours = c("rna_reads" = "grey60",
"treated_rna_reads" = "grey60",
"treated_dRNA_reads" = "orange",
"KSS" = "hotpink4",
"MS" = "black",
"0" = "#440854",
"1" = "#23A884",
"2" = "#FEE725"),
codon_queries = list(
inspectorORF::codon_query(annotate_start = T, colour = "black"),
inspectorORF::codon_query(annotation_codons = c("AUG"), in_frame = T, colour = "darkred"),
inspectorORF::codon_query(annotation_codons = c("CUG"), in_frame = T, colour = "darkgreen")
)
)
When annotating codons, stop codons can also be included with
annotate_stop = TRUE. This can be particularly useful when
annotating in-frame start codons across the transcript to indicate
alternative ORFs which are possibly translated.
inspectorORF::orf_plot(
tx_tracks,
transcript_filter = "ENST00000343702.9",
start_position = 456,
stop_position = 2339,
scale_to_psites = T,
plot_region = c(-1, -1),
condition_names = c("rna_reads" = "Control",
"treated_rna_reads" = "Treated"),
plot_read_pairs = c("p_sites" = "rna_reads",
"treated_psites" = "treated_rna_reads"),
# Declare names for the RNA and P-Site reads, including the additional
# dRNA_reads, KSS and MS data for them to be included on the plot
dataset_names = c("rna_reads" = "RNA-Seq Reads",
"p_sites" = "P-Sites",
"treated_rna_reads" = "RNA-Seq Reads",
"treated_psites" = "P-Sites",
"treated_dRNA_reads" = "Treated dRNA-Seq Reads",
"KSS" = "AUG Kozak scores",
"MS" = "Proteomics"),
# State the colours for the additional data
plot_colours = c("rna_reads" = "grey60",
"treated_rna_reads" = "grey60",
"treated_dRNA_reads" = "orange",
"KSS" = "hotpink4",
"MS" = "black",
"0" = "#440854",
"1" = "#23A884",
"2" = "#FEE725"),
codon_queries = list(
inspectorORF::codon_query(in_frame = T, annotate_stop = T, colour = "darkred")
)
)
By default, annotate_stop = TRUE will search for all
three possible stop codons (UAA, UAG and UGA).
annotate_stop also accepts a character vector to search for
specific stop codons instead -
inspectorORF::orf_plot(
tx_tracks,
transcript_filter = "ENST00000343702.9",
start_position = 456,
stop_position = 2339,
scale_to_psites = T,
plot_region = c(-1, -1),
condition_names = c("rna_reads" = "Control",
"treated_rna_reads" = "Treated"),
plot_read_pairs = c("p_sites" = "rna_reads",
"treated_psites" = "treated_rna_reads"),
# Declare names for the RNA and P-Site reads, including the additional
# dRNA_reads, KSS and MS data for them to be included on the plot
dataset_names = c("rna_reads" = "RNA-Seq Reads",
"p_sites" = "P-Sites",
"treated_rna_reads" = "RNA-Seq Reads",
"treated_psites" = "P-Sites",
"treated_dRNA_reads" = "Treated dRNA-Seq Reads",
"KSS" = "AUG Kozak scores",
"MS" = "Proteomics"),
# State the colours for the additional data
plot_colours = c("rna_reads" = "grey60",
"treated_rna_reads" = "grey60",
"treated_dRNA_reads" = "orange",
"KSS" = "hotpink4",
"MS" = "black",
"0" = "#440854",
"1" = "#23A884",
"2" = "#FEE725"),
codon_queries = list(
inspectorORF::codon_query(in_frame = T, annotate_stop = c("UAG"), colour = "darkred")
)
)
Annotated codons can also be plot on their on track facet, which is
particularly useful in the case of lots of codons. This can be enabled
by naming the codon query with
annotation_label = "My Codons".
By default, all annotated codons will be within the same track facet titled ‘Labelled Codons’
inspectorORF::orf_plot(
tx_tracks,
transcript_filter = "ENST00000343702.9",
start_position = 456,
stop_position = 2339,
scale_to_psites = T,
# Add 5'UTR region from start of transcript
plot_region = c(-1, NA),
condition_names = c("rna_reads" = "Control",
"treated_rna_reads" = "Treated"),
plot_read_pairs = c("p_sites" = "rna_reads",
"treated_psites" = "treated_rna_reads"),
# Declare names for the RNA and P-Site reads, including the additional
# dRNA_reads, KSS and MS data for them to be included on the plot
dataset_names = c("rna_reads" = "RNA-Seq Reads",
"p_sites" = "P-Sites",
"treated_rna_reads" = "RNA-Seq Reads",
"treated_psites" = "P-Sites",
"treated_dRNA_reads" = "Treated dRNA-Seq Reads",
"KSS" = "AUG Kozak scores",
"MS" = "Proteomics"),
# State the colours for the additional data
plot_colours = c("rna_reads" = "grey60",
"treated_rna_reads" = "grey60",
"treated_dRNA_reads" = "orange",
"KSS" = "hotpink4",
"MS" = "black",
"0" = "#440854",
"1" = "#23A884",
"2" = "#FEE725"),
codon_queries = list(
inspectorORF::codon_query(annotate_start = T, colour = "black"),
inspectorORF::codon_query(annotation_codons = c("AUG"), in_frame = T, colour = "darkred"),
inspectorORF::codon_query(annotation_codons = c("CUG"), in_frame = T, colour = "darkgreen", annotation_label = "CUG Codons")
)
)
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-apple-darwin20
#> Running under: macOS 26.1
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: Europe/London
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] DT_0.33 lubridate_1.9.4 forcats_1.0.0
#> [4] stringr_1.5.1 dplyr_1.1.4 purrr_1.0.4
#> [7] readr_2.1.5 tidyr_1.3.1 tibble_3.2.1
#> [10] ggplot2_3.5.1 tidyverse_2.0.0 inspectorORF_1.1.0
#> [13] GenomicRanges_1.58.0 GenomeInfoDb_1.42.3 IRanges_2.40.1
#> [16] S4Vectors_0.44.0 BiocGenerics_0.52.0
#>
#> loaded via a namespace (and not attached):
#> [1] tidyselect_1.2.1 farver_2.1.2
#> [3] R.utils_2.13.0 Biostrings_2.74.1
#> [5] bitops_1.0-9 fastmap_1.2.0
#> [7] RCurl_1.98-1.16 GenomicAlignments_1.42.0
#> [9] XML_3.99-0.18 digest_0.6.37
#> [11] timechange_0.3.0 lifecycle_1.0.4
#> [13] plyranges_1.26.0 magrittr_2.0.3
#> [15] compiler_4.4.2 rlang_1.1.5
#> [17] sass_0.4.9 tools_4.4.2
#> [19] yaml_2.3.10 data.table_1.16.4
#> [21] rtracklayer_1.66.0 knitr_1.49
#> [23] labeling_0.4.3 htmlwidgets_1.6.4
#> [25] S4Arrays_1.6.0 bit_4.5.0.1
#> [27] curl_6.2.0 DelayedArray_0.32.0
#> [29] abind_1.4-8 BiocParallel_1.40.0
#> [31] withr_3.0.2 R.oo_1.27.0
#> [33] grid_4.4.2 ggh4x_0.3.0
#> [35] colorspace_2.1-1 scales_1.3.0
#> [37] SummarizedExperiment_1.36.0 cli_3.6.4
#> [39] rmarkdown_2.29 crayon_1.5.3
#> [41] generics_0.1.3 rstudioapi_0.17.1
#> [43] httr_1.4.7 tzdb_0.4.0
#> [45] rjson_0.2.23 cachem_1.1.0
#> [47] zlibbioc_1.52.0 parallel_4.4.2
#> [49] XVector_0.46.0 restfulr_0.0.15
#> [51] matrixStats_1.5.0 vctrs_0.6.5
#> [53] Matrix_1.7-1 jsonlite_1.8.9
#> [55] hms_1.1.3 patchwork_1.3.0
#> [57] bit64_4.6.0-1 ggrepel_0.9.6
#> [59] bioseq_0.1.5 crosstalk_1.2.1
#> [61] jquerylib_0.1.4 glue_1.8.0
#> [63] minidown_0.4.0 codetools_0.2-20
#> [65] stringi_1.8.4 gtable_0.3.6
#> [67] BiocIO_1.16.0 UCSC.utils_1.2.0
#> [69] munsell_0.5.1 pillar_1.10.1
#> [71] htmltools_0.5.8.1 GenomeInfoDbData_1.2.13
#> [73] R6_2.6.0 vroom_1.6.5
#> [75] evaluate_1.0.3 lattice_0.22-6
#> [77] Biobase_2.66.0 R.methodsS3_1.8.2
#> [79] Rsamtools_2.22.0 bslib_0.9.0
#> [81] Rcpp_1.0.14 SparseArray_1.6.1
#> [83] xfun_0.50 MatrixGenerics_1.18.1
#> [85] pkgconfig_2.0.3